Risk factor targeting for vaccine prioritization during the COVID-19 pandemic

A key public health question during any disease outbreak when limited vaccine is available is who should be prioritized for early vaccination. Most vaccine prioritization analyses only consider variation in risk of infection and death by a single risk factor, such as age. We provide a more granular approach with stratification by demographics, risk factors, and location. We use this approach to compare the impact of different COVID-19 vaccine prioritization strategies on COVID-19 cases, deaths and disability-adjusted life years (DALYs) over the first 6 months of vaccine rollout, using California as a case example. We estimate the proportion of cases, deaths and DALYs averted relative to no vaccination for strategies prioritizing vaccination by a single risk factor and by multiple risk factors (e.g. age, location). When targeting by a single risk factor, we find that age-based targeting averts the most deaths (62% for 5 million individuals vaccinated) and DALYs (38%) and targeting essential workers averts the least deaths (31%) and DALYs (24%) over the first 6 months of rollout. However, targeting by two or more risk factors simultaneously averts up to 40% more DALYs. Our findings highlight the potential value of multiple-risk-factor targeting of vaccination against COVID-19 and other infectious diseases, but must be balanced with feasibility for policy.


COVID-19 case and death data
De-identified individual-level data on 2,215,972 confirmed COVID-19 cases (individuals who tested PCR positive for SARS-CoV-2) in California up to December 30, 2020, was provided by the California Department of Public Health (CDPH). Among these cases, 28,175 individuals died due to COVID-19. The data included information on county of residence, age category, sex, race/ethnicity, date of positive PCR test result, and, where applicable, date of death (see Table S3 for variable definitions). Complete demographic data was available for 1,578,988 (71%) cases and 27,580 (98%) individuals who died.

Overview of COVID-19 simulation model
We simulated COVID-19 deaths, clinical cases, infections and disability-adjusted life-years (DALYs) using the following steps, each of which is described below in further detail: 1. Simulate COVID-19 deaths using Poisson regression model fitted to COVID-19 death data with death hazard rate adjusted by special population status and comorbidity status, and calculate associated DALYs. 2. Estimate infections from predicted deaths using published age-and sex-dependent COVID-19 infection fatality rate (IFR). 3. Estimate clinical cases from predicted infections using published age-dependent clinical fraction for COVID- 19. 4. Calculate DALYs associated with infections and clinical cases.

Calibration of infection fatality rate for California
To account for variation in the IFR across different settings, we recalibrated the age-and sex-dependent ensemble IFR estimate from (9) for California, using a combination of data on confirmed cases and deaths from CDPH up to late October, 2020; seroprevalence estimates for California up to October 8-13, 2020 (from 6 biweekly surveys of approximately 980 individuals), from the CDC's database of nationwide commercial laboratory seroprevalence surveys (10); and age-stratified data on numbers of COVID-19 deaths in California among long-term care facility (LTCF) residents (SNF and ALF residents) and non-LTCF residents provided by CDPH. The IFR estimate from (9)  (accounting for a mean reporting-to-death time of 9 days calculated from the data). We note that this does not account for the fact that some confirmed cases up to October 22, 2020, were asymptomatic and that symptomatic cases were under-reported early in the pandemic. However, these two biases will have acted in opposite directions and information on whether confirmed cases were asymptomatic was unavailable, so we assume that the cumulative number of confirmed cases up to October 22, 2020, provides a reasonable approximation of the cumulative number of clinical cases up to that date. We verified that this approach gave a reasonable match between the age distribution of clinical cases from the model and that of confirmed cases (Figure 2A in main text), and between the age distribution of cumulative infections estimated from the model (2,433,000) and that estimated from the mean seroprevalence estimates for California up to October 8- 13,2020 (1,927,000) (using an approximately 12-day infection-to-reporting delay between the seroprevalence data centred on October 10 and case data calculated from the median onset-to-reporting delay in the data and incubation period (12)) ( Figure 2B in main text). We note that the estimate of the number of individuals infected based on the seroprevalence data is likely to be an underestimate of the true number as it does not account for waning of antibodies (13).
As a further check on this approach, we performed a Poisson regression of cumulative case numbers against demographic risk factors (age, sex, race/ethnicity and county of residence) for both the estimated clinical case counts and confirmed case counts and compared the resulting parameter estimates. Overall, the two sets of parameter estimates agreed closely with each other ( Figure S1). The hazard ratio estimates from the estimated clinical case counts were higher than those from the confirmed case counts for certain counties, possibly suggesting greater under-reporting of cases in these counties and/or higher IFRs, e.g., due to overburdened hospitals. ( I n t e r c e p t ) a g e _ c a t < 1 0 a g e _ c a t 1 0 − 1 9 a g e _ c a t 2 0 − 2 9 a g e _ c a t 3 0 − 3 9 a g e _ c a t 4 0 − 4 9 a g e _ c a t 6 0 − 6 9 a g e _ c a t 7 0 − 7 9 a g e _ c a t 8 0 +  Representing the different risk factors (county, age group, sex, race/ethnicity) by a vector of ( covariates = = >* = , … , * / @ whose value is indexed by the index set (6 = , … , 6 / ), and the total number of COVID-19 deaths and total survival time of individuals with covariate levels (6 = , … , 6 / ) by C 1 ! …1 " and D 1 ! …1 " respectively (such that C 1 ! …1 " /D 1 ! …1 " is the COVID-19 death rate among individuals with covariate levels (6 = , … , 6 / )), the Poisson regression model is: where I 1 ! …1 " is the conditional mean of the Poisson distribution for the cumulative number of deaths, P = (Q = , … , Q / ) is a column vector of the covariate coefficients (such that / A # is the hazard ratio for covariate * 1 ), and O ? is the baseline COVID-19 death rate in California (assumed constant). This model is equivalent to a proportional hazards model for time to COVID-19 death (14,15), with hazard rate: Hazard ratios and their standard errors were estimated using the glm function in R (version 4.0.3) (16). We compared parameter estimates for death data from different periods of time (the last 3 months and 6 months of 2020, and time from the first death to the end of 2020) to assess their stability over time.

Adjustment of COVID-19 death risk by special population status and comorbidity status
We adjusted risk of COVID-19 death by special population status and comorbidity status using published literature estimates for the relative risk of SARS-CoV-2 infection and death from COVID-19 for the special populations and hazard ratios for death given SARS-CoV-2 infection associated with the comorbidities (Table S4). For LTCF residents, we used estimates of age-specific relative risk of COVID-19 death calculated from the CDPH age-and-LTCF-stratified data. For essential workers, we calculated estimates of the relative risk of COVID-19 death for frontline and non-frontline essential workers using estimates of relative risk of death by occupation from a recent analysis of excess mortality in California during the pandemic (17), categorising employment sectors according to the CDC classification of essential workers (7). For each special population R for which data on relative risk of death was not available (healthcare workers, prisoners, education workers, and people experiencing homelessness), we derived an estimate of the relative risk of death, %% D , by multiplying an estimate of relative risk of infection in the special population, %% 14E,D (from published literature or calculated from publicly available data on cumulative incidence of confirmed cases in the special population), by an estimate of the relative risk of death given infection, %% +*FG9|14E,D , derived from the age-and sex-specific IFR, to account for differences in age distributions between special populations: where S:% TTTTT D is the population-weighted average IFR for special population R and S:% TTTTT is the population-weighted average IFR for the general population. We assumed that comorbidities only affected risk of death once infected, and that they had a multiplicative effect on risk of death. The death rate for each demographic risk factor group O 1 ! …1 " was multiplied by the following factor to ensure the overall death risk remained the same (i.e. to account for the higher death risk for individuals in special populations and with comorbidities): is the number of individuals with demographic risk factor levels (6 = , … , 6 / ), %% DH ! …H % is the relative risk of death associated with special population R (where R = 0 represents individuals that are not part of one of the 8 special populations) and comorbidity status (W = , … , W I ) (the particular combination of binary statuses for the 7 comorbidities), and & 1 ! …1 " DH ! …H % is the number of individuals in each demographic-special-populationcomorbidity-status subgroup.

Estimation of number of individuals already infected
We estimated the number of individuals already infected in each demographic risk group by dividing the observed cumulative number of deaths up to December 30, 2020, in each group by the age-and sex-specific recalibrated IFR for that group (see Calibration of infection fatality rate for California). For LTCF residents, we first multiplied the IFR by a frailty index representing increased risk of death given infection among LTCF residents compared to the general population, which we assumed to be 3 based on previous estimates (9,18). The probability of past infection in each age-sex-special-population group was calculated by dividing the estimated number of infected individuals by the group population size, and used to simulate whether or not individuals in the simulated population had been previously infected.

Table S4. Estimates of relative risks of SARS-CoV-2 infection and COVID-19 death for special populations and hazard ratios for death given infection for different comorbidities
Smoker 1 (24-26) * !"(1, 2 ! , 3, 4) = truncated normal distribution with mean 1, non-truncated variance 2 ! and left and right truncation limits 3 and 4. Uncertainty bounds for relative risks of infection reflect a combination of statistical uncertainty (95% CIs for published estimates) and uncertainty due to variation in values from different sources. Death hazard ratios with 95% CI overlapping 1 in original source taken to be 1. ** Further details of the sources and calculations used to derive these values are given in File S1. *** Estimated as mean of relative risks of death for workers in different employment sectors from (17) categorised into frontline vs non-frontline essential work using the CDC classification (7).

Simulation of COVID-19 deaths
To simulate COVID-19 deaths, we first calculated the cumulative probability of death for individuals in the simulated population over 6 months according to their demographic characteristics, special population status, comorbidities, simulated history of infection, and whether or not they were vaccinated. The cumulative probability of death over 6 months (D = 182 days) for a susceptible unvaccinated individual with covariate levels >6 = , … , 6 / @ in special population R with comorbidity status (W = , … , W I ) was: We assumed that the vaccine is 'leaky' and has efficacy `*(a) ∈ [0,1] against death that depends on time since vaccination a, i.e. reduces the relative risk of death at time a after vaccination by a factor (1 −`*(a)). We accounted for waning of vaccine-induced immunity against death by assuming that vaccine efficacy decreases exponentially at a rate 5 with time since vaccination, i.e. `*(a) =`* ? / J0G , where `* ? = 0.95 is the initial vaccine efficacy. Based on an estimated reduction in efficacy of the Pfizer vaccine against death from a maximum of 98.2% 2-9 weeks post vaccination to 90.4% 20+ weeks post-vaccination (27), we used a value of 5 = − ! !)* log>1 − +).*-+../ +).* @ = 0.00045 day J= . The cumulative probability of death over 6 months for a vaccinated individual not previously infected was therefore: Since a number of studies have shown that immunity acquired from previous infection wanes over time (28)(29)(30)(31), we also accounted for potential reinfection and subsequent death amongst previously infected (6) individuals by modeling their cumulative probability of death over 6 months as: where 6 *? is the initial protection against death afforded by previous infection. In other words, we assumed that previous infection offers protection against death that wanes at the same rate as protection from vaccination and made the simplifying assumption that immunity from previous infection only began to appreciably wane at the start of 2021. We use 6 *? =`* ? = 0.95, based on evidence of similar levels of protection against symptomatic infection from prior natural infection and two doses of a viral vector vaccine or mRNA vaccine (32).
The number of past infections, S /F2G,W , and future deaths amongst those not previously infected, C W , and those previously infected, C W,1 , in each demographic-special-populationcomorbidity risk group g = (6 = , … , 6 / , R, W = , … , W I ) was simulated as where ( /F2G,W is the probability of previous infection for individuals in the risk group (estimated as described above). We ran 1000 simulations of past infections and future deaths for the full California population (& = 39,148,760) to account for stochastic uncertainty. We also accounted for uncertainty in the estimated death rate and relative risks of infection for the special populations, by drawing values for log (O 1 ! …1 " ) from truncated normal distributions with bounds and standard deviations derived from the 95% confidence intervals (CIs) for the regression coefficients (Table S6), and values for the relative risks from truncated normal distributions with bounds and standard deviations derived from uncertainty in literature estimates (Table S4).

Estimation of infections and clinical cases
The cumulative number of SARS-CoV-2 infections in each demographic-special-populationcomorbidity risk group was calculated from the simulated number of deaths by dividing by the recalibrated age-and sex-dependent IFR, adjusted for vaccinated individuals to account for different vaccine efficacies against infection and death: where `* { and `* X TTTT are the average vaccine efficacies against death and infection over the first 6 months of the rollout, and S:% W is the IFR for individuals in risk group g given their age and sex. As for the vaccine efficacy against death, we assumed that the vaccine efficacy against infection decreases exponentially over time, `* 1 (a) =`* 1? / J0 # G , but from initial efficacy `* 1? at rate 5 1 = − ! !)* log>1 − +*./-3+.% +*./ @ = 0.0015 day J= (based on an estimated decline in the efficacy of the Pfizer vaccine against symptomatic infection from 92.4% 1 week after vaccination to 69.7% 20+ weeks after vaccination (27)). The average vaccine efficacies are therefore given by: (1 − / J0L ) `* X TTTT =`* 1?
We used `* 1? = 0.85 based on published estimates of two-dose efficacy of the Pfizer vaccine against infection (Table S7 in (33)). We assumed that levels of protection against infection and death from previous infection are the same as those from vaccination, such that the cumulative number of reinfections among previously infected individuals could be calculated as: Numbers of clinical cases were then calculated from estimated numbers of infections by multiplying by the estimated age-dependent clinical case fraction from (11), adjusted for vaccinated individuals to account for vaccine-induced protection against clinical symptoms given infection. In the sensitivity analyses for different vaccine efficacies (see Methods in main text), we assumed the same scaling of vaccine efficacy against infection to vaccine efficacy against death as in the main analysis, i.e. `* 1 /`* = 0.85/0.95 = 0.89.

Calculation of DALYs
DALYs associated with illness and/or death due to a disease are given by the sum of years of life lost (YLL) and years lived with disability (YLD): and thus provide a composite measure of mortality and morbidity. Years lived with disability depend on the duration of illness 0 and the disability weight } ∈ [0,1] associated with the illness (where 0 represents perfect health and 1 represents death): Total DALYs associated with simulated COVID-19 deaths, C$|47 + , were calculated as: i.e. by adding the average remaining life expectancy | W TTT of individuals in risk group g who died, calculated from 2018 US life tables (34) and the age, sex, and race/ethnicity distribution within the risk group, to YLD associated with severe illness from COVID-19, } 2 0 2 , multiplying by the number of deaths in the risk group, C W , and summing over all risk groups. Total DALYs from clinical cases, C$|47 -, and subclinical infections, C$|47 2 , were calculated as: where Sand S are the estimated numbers of clinical cases and overall infections respectively, and } : 0 : and } :1 0 :1 are the YLD associated with clinical and subclinical infection respectively. We used estimates of disability weights for acute episodes of mild, moderate and severe illness from (35) for subclinical infection, clinical infection and death from COVID-19 respectively: } :1 = 0.005, } : = 0.053, } 2 = 0.210. For the corresponding durations of illness 0 :1 , 0 : and 0 2 , we used data on symptom duration stratified by symptom severity from a study of 273 COVID-19 outpatients in Atlanta, Georgia (36) and data from other studies and systematic reviews (37-39) that suggests median durations of symptoms for subclinical infection, clinical infection, and severe illness prior to death are 7 days, 10 days and 18 days respectively.
Approximate 95% uncertainty intervals (UIs) for predicted outcomes were calculated as the 2.5%-97.5% quantiles of the distributions of the simulated outcomes.

Calculation of QALYs
As a sensitivity analysis, we also considered prioritising vaccine allocation under each vaccination strategy (see Methods in main text) to maximize quality-adjusted life years (QALYs) saved instead of DALYs averted. Total QALY losses due to death, $|47 + , were calculated as: where ! W is the quality-adjusted value of the average remaining life expectancy, | W TTT , of an individual in risk group g. The quality-adjusted life expectancy is given by: where F is the quality of life weight for an individual of age Ä, and Ä W TTT is the average age of an individual in risk group g. We used values for the US from Table 3.6 of (40) for F (Table  S5). We calculated total QALYs lost due to clinical and subclinical infections, $|47 -and $|47 2 , as: where 1 is the QALYs lost per clinical case, which we took to be 0.007 (based on values of 0.007 in (41) and 0.0075 in (42)), and the disutility of subclinical infection has been assumed to be approximately 7% of that of clinical infection, in line with its relative disability weight and duration in our DALY calculations. (40)

Vaccine prioritization
For all vaccine prioritization strategies, we assumed that all HCWs and LTCF residents were vaccinated first as per the CDC guidelines for the first phase of the vaccine rollout (44). For the remainder of the population, we calculated the expected average risk of DALYs per person per day for all subgroups under the prioritization strategy, and then ranked subgroups in descending order of DALY risk. Vaccines were then allocated to these subgroups in this order and randomly within each subgroup until the total number of vaccines available, 9 O , was reached. So, e.g., for age targeting, after vaccination of HCWs and LTCF residents, vaccinations were allocated to the age group with the highest average DALY risk and then to age groups with progressively decreasing DALY risk until the vaccine quota was reached. The random allocation of vaccinations within subgroups was varied in each of the 1000 simulations to account for uncertainty from variation in the individuals vaccinated before the vaccine quota was met. Three different initial vaccine quotas were considered (9 O = 2 million, 5 million, 10 million), reflecting variation in initial vaccine availability.
Under special population targeting, essential workers (frontline and non-frontline) were grouped with the section of the population not belonging to any special population. Under essential worker targeting, frontline and non-frontline essential workers were grouped together and the rest of the non-HCW, non-LTCF-resident population was treated as one group, and allocation was assumed to be random within these two groups. For comorbidity targeting, we grouped individuals into those with any comorbidities and those with no comorbidities, and assumed allocation was random within these groups, on the basis that targeting by number of comorbidities would be practically and economically infeasible. Under age-and-special-population targeting, essential workers were included among the special populations targeted for vaccination. In all strategies, excess vaccines remaining after complete target group coverage were randomly allocated amongst the remaining population.

Data and code availability
All analysis code was developed in R version 4.0.3 (16) and is available online at https://github.com/LloydChapman/COVIDVaccineModelling. The CDPH case data required for fitting the Poisson regression model contain personally identifiable information and therefore cannot be made freely available. Individuals interested in accessing the data should contact CDPH. All the data required to run the vaccine prioritization simulations is available at http://doi.org/10.5281/zenodo.4516526.

Regression model parameter estimates
Parameter estimates for the Poisson regression model (Table S6 and Figure S2) show significant variation in COVID-19 death risk across counties, with hazard ratios (HRs) relative to Alameda County varying from 0.38 for Northern California to 3.7 for Imperial County (when fitting to all data since the first recorded death on February 5, 2020). The hazard ratio estimates for age, sex and race/ethnicity reflect the increase in death risk with increasing age, higher death risk for males, and higher death risk for Hispanic/Latino individuals described in the main text. There was some variation in parameter estimates when fitting to different periods of past data (the last 3, 6, and 11 months of 2020), e.g., a few counties went from being lower risk than Alameda County to higher and vice versa, and there was a slight downward shift in the age distribution of deaths over time, but overall parameter estimates were highly consistent (Table S6).

Vaccine availability
The order of impact of the different prioritization strategies on DALYs, deaths and cases was robust to variation in the vaccine availability. However, with 2 million vaccinations instead of 5 million, i.e. only sufficient doses to vaccinate 5% of California's population rather than 13%, none of the prioritization strategies targeting by a single risk factor averted significantly more DALYs than random allocation over 6 months since the vast majority of the 2 million vaccinations were used up in vaccinating HCWs and LTCF residents (~1.85 million individuals in California) in the first phase of the rollout, and prioritization of the remaining 150,000 vaccinations made only a small difference to the numbers of DALYs averted. Prioritizing older individuals averted 18% (95% UI 18-19%) of the overall DALY burden under no vaccination, while prioritizing special populations, individuals with comorbidities, essential workers and random allocation all averted 17-18% (Table S7 and Figure S3E). However, targeting across all risk factors simultaneously averted a higher proportion of DALYs (23%, 95% UI 22-23%). There was little difference in impact on cases between the different strategies -the percentage of cases averted was between 6% for all strategies (Table S7 and Figure S3A) -but allocating the vaccinations remaining after the first phase to older individuals averted more deaths (31%) than allocating them to special populations, individuals with comorbidities, or essential workers (all 28%).
With double the vaccine availability -10 million vaccinations instead of 5 million, i.e., enough vaccine to vaccinate just over a quarter of California's population -age targeting averted by far and away the most DALYs of the strategies targeting by a single risk factor and 62% (95% UI 61-63%) of the simulated overall burden over 6 months (Table S8 and Figure S3F). This was principally due to averting a much higher proportion of deaths than any of the other strategies -79% (95% UI 79-80%) compared to 54% (95% UI 53-56%) for comorbidity targeting as the next best performing strategy (Table S8 and Figure S3D). Special population targeting and essential worker targeting averted only 38% (95% UI 37-40%) and 35% (95% UI 34-36%) of the DALY burden respectively, due to only averting 46% (95% UI 44-47%) and 37% (95% UI 37-38%) of deaths. As for 5 million individuals vaccinated, age-and-county targeting and targeting simultaneously across all risk factors averted a higher proportion of DALYs (68% and 72% respectively) than any of the strategies targeting by a single risk factor. Of strategies targeting by a single risk factor, essential worker targeting averted the highest proportion of clinical cases (26%, 95% UI 22-29%), and age targeting averted the lowest proportion (22%, 95% UI 19-24%). Counties with population <250,000 (except Imperial) were combined into a single region by their economic region, San Benito County was combined with Santa Cruz County, and Napa County was combined with Sonoma County. Plotted hazard ratios for these counties represent the hazard ratio of the combined region.   23 (22)(23) Vaccine impact simulated over 6 months. All strategies assumed that all healthcare workers and long-term care facility residents were vaccinated first as per the CDC recommendation (44).   S a n t a B a r b a r a S a n t a C la r a S a n t a C r u z _ S a n B e n it o S o la n o S t a n is la u s T u la r e V e n t u r a S a n t a B a r b a r a S a n t a C la r a S a n t a C r u z _ S a n B e n it o   25 (24)(25)(26)(27) ii) Special population targeting